A greedy pursuit approach to classification using multi-task multivariate sparse representations
Authors
Abstract
In this report, we propose an extension of the well-known Simultaneous Orthogonal Matching Pursuit (SOMP) algorithm to solve a multi-task multivariate classification problem using sparse representations. This corresponds to the case where an event is described by multiple representations, and separate training dictionaries are designed for each such representation.

I. NOTATION

Let $y_i \in \mathbb{R}^m$, $i = 1, \ldots, T$, be $T$ different representations¹ of the same physical event, which is to be classified into one of $C$ different classes. Let $Y := [y_1 \ \ldots \ y_T] \in \mathbb{R}^{m \times T}$. Assuming $n$ training samples/events in total, we design $T$ dictionaries $D_i \in \mathbb{R}^{m \times n}$, $i = 1, \ldots, T$, corresponding to the $T$ representations. We define a new composite dictionary $D := [D_1 \ \ldots \ D_T] \in \mathbb{R}^{m \times nT}$. Further, each dictionary $D_i$ is represented as the concatenation of the sub-dictionaries from all classes corresponding to the $i$-th representation of the event:

$$D_i := [D_i^1 \ D_i^2 \ \ldots \ D_i^C], \qquad (1)$$

where $D_i^j$ represents the collection of training samples for representation $i$ that belong to the $j$-th class. So, we have:

$$D := [D_1 \ \ldots \ D_T] = [D_1^1 \ D_1^2 \ \ldots \ D_1^C \ \ldots \ D_T^1 \ D_T^2 \ \ldots \ D_T^C]. \qquad (2)$$

An important assumption in designing $D$ is that the $k$-th columns of the dictionaries $D_i$, $i = 1, \ldots, T$, taken together offer multiple representations of the $k$-th training sample/event.

II. MULTI-TASK MULTIVARIATE SPARSE REPRESENTATIONS

A test event $Y$ can now be represented as a linear combination of training samples as follows:

$$Y = [y_1 \ \ldots \ y_T] = DS = [D_1^1 \ D_1^2 \ \ldots \ D_1^C \ \ldots \ D_T^1 \ D_T^2 \ \ldots \ D_T^C]\,[\alpha_1 \ \ldots \ \alpha_T], \qquad (3)$$

where the coefficient vectors $\alpha_i \in \mathbb{R}^{nT}$, $i = 1, \ldots, T$, and $S = [\alpha_1 \ \ldots \ \alpha_T] \in \mathbb{R}^{nT \times T}$. We examine the structure of the coefficient matrix $S$ and make some crucial observations.

• It is reasonable to assume that the $i$-th representation of the test event (i.e. $y_i$) can be approximately represented by the linear span of the training samples belonging to the $i$-th representation alone (i.e. only those training samples in $D_i$). So the columns of $S$ have the following structure: each vector $\alpha_i$ has non-zero coefficients only in the locations corresponding to the columns of $D_i$ and zeros elsewhere. As a result, $S$ exhibits a block-diagonal structure.

• Each representation $y_i$ of the test event is a sparse linear combination of the training samples in $D_i$. Suppose the event belongs to class $c \in \{1, \ldots, C\}$; then only those coefficients in $\alpha_i$ that correspond to $D_i^c$ are expected to be non-zero.

• Furthermore, the non-zero weights of training samples in the linear combination exhibit a one-to-one correspondence across representations: if the $j$-th training sample from the $c$-th class in $D_1$ has a non-zero contribution to $y_1$, then for all $i = 2, \ldots, T$, $y_i$ has a non-zero contribution from the $j$-th training sample of the $c$-th class in $D_i$.

This suggests a joint sparsity model similar to the model introduced in [1]. However, the multi-task nature of the problem, with different dictionaries $D_i$, does not permit us to apply the SOMP algorithm from [1] directly. Since $S$ obeys column correspondence, we introduce a new matrix $S' \in \mathbb{R}^{n \times T}$ as the transformation of $S$ with the zero coefficients removed:

$$S' = \begin{bmatrix} \alpha_1^1 & \cdots & \alpha_i^1 & \cdots & \alpha_T^1 \\ \vdots & & \vdots & & \vdots \\ \alpha_1^C & \cdots & \alpha_i^C & \cdots & \alpha_T^C \end{bmatrix},$$

where $\alpha_i^j$ refers to the sub-vector extracted from $\alpha_i$ that corresponds to the coefficients of the $j$-th class. Note that, in the $i$-th column of $S'$, only the coefficients corresponding to $D_i$ are retained (for $i = 1, \ldots, T$).
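To make the notation concrete, the following is a minimal numpy sketch of the dictionary layout in Eqs. (1)-(3) and of the block-diagonal structure of $S$. The sizes m, T, C, n_c and the random training samples are hypothetical placeholders chosen only for illustration, not values from the report.

```python
import numpy as np

# Hypothetical sizes: feature dimension m, representations T, classes C,
# and n_c training samples per class (so n = C * n_c samples in total).
m, T, C, n_c = 20, 3, 4, 5
rng = np.random.default_rng(0)

# D_blocks[i][j] plays the role of D_i^{j+1}: class-(j+1) training samples
# for representation i (random placeholders here).
D_blocks = [[rng.standard_normal((m, n_c)) for _ in range(C)] for _ in range(T)]

# Eq. (1): D_i = [D_i^1 ... D_i^C];  Eq. (2): D = [D_1 ... D_T].
D_i = [np.hstack(D_blocks[i]) for i in range(T)]   # each (m, n), n = C * n_c
D = np.hstack(D_i)                                  # (m, n * T)

# Eq. (3): Y = D S with a block-diagonal S, i.e. column i of S (alpha_i)
# is supported only on the block of rows belonging to D_i.
n = C * n_c
alpha = rng.standard_normal((n, T))                 # per-representation coefficients
S = np.zeros((n * T, T))
for i in range(T):
    S[i * n:(i + 1) * n, i] = alpha[:, i]
Y = D @ S

# With this block structure, column i of S' keeps only the D_i coefficients,
# so S' is simply the (n x T) matrix alpha, and y_i = D_i @ alpha[:, i].
assert np.allclose(Y, np.column_stack([D_i[i] @ alpha[:, i] for i in range(T)]))
```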
¹ The term multi-task is used to refer to these multiple representations in some application domains such as heterogeneous sensor fusion.

We can now apply row-sparsity constraints similar to the approach in [1]. Our modified optimization problem becomes:

$$\hat{S}' = \arg\min_{S'} \|S'\|_{\mathrm{row}\text{-}0} \quad \text{subject to} \quad \|Y - DS\|_F \le \epsilon, \qquad (4)$$

for some tolerance $\epsilon > 0$. We minimize the number of non-zero rows of $S'$, while the constraint guarantees a good approximation. The matrix $S$ can be transformed into $S'$ by introducing matrices $H \in \mathbb{R}^{nT \times T}$ and $J \in \mathbb{R}^{n \times nT}$,

$$H = \mathrm{diag}\,[\mathbf{1} \ \mathbf{1} \ \ldots \ \mathbf{1}], \qquad J = [I_n \ I_n \ \ldots \ I_n],$$

where $\mathbf{1} \in \mathbb{R}^n$ is the vector of all ones, and $I_n$ denotes the $n \times n$ identity matrix. Finally, we obtain $S' = J(H \circ S)$, where $\circ$ denotes the Hadamard product, $(H \circ S)_{ij} := h_{ij} s_{ij}$ for all $i, j$.

III. EXTENSION OF SOMP FOR MULTI-TASK MULTIVARIATE SPARSE REPRESENTATIONS

Eq. (4) represents a hard optimization problem due to the presence of the non-invertible transformation from $S$ to $S'$. We bypass this difficulty by proposing a modified version of the SOMP algorithm for the multi-task multivariate case. Recall that the original SOMP algorithm selects $K$ distinct atoms (assuming $K$ iterations) from a dictionary $D$ that best represent the data matrix $Y$: in every iteration $k$, SOMP correlates each atom in $D$ with the current residual, selects the atom with maximal correlation, and updates the residual through an orthogonal projection. Extending this to the multi-task setting, for every representation $i$, $i = 1, \ldots, T$, we can identify the index that gives the highest correlation with the residual at the $k$-th iteration as follows:

$$\lambda_{i,k} = \arg\max_{j=1,\ldots,n} \sum_{q=1}^{T} w_q \left\| R_{k-1}^t d_{q,j} \right\|_p, \quad p \ge 1,$$

where $w_q$ denotes the weight (confidence) assigned to the $q$-th representation, $d_{q,j}$ represents the $j$-th column of $D_q$, $q = 1, \ldots, T$, and the superscript $(\cdot)^t$ indicates the matrix transpose operator. After finding $\lambda_{i,k}$, we update the index set:

$$\Lambda_{i,k} = \Lambda_{i,k-1} \cup \{\lambda_{i,k}\}, \quad i = 1, \ldots, T.$$

Thus, by finding the index set for the $T$ distinct representations, we can create an orthogonal projection with each of the atoms in their corresponding representations. The algorithm is summarized below in Algorithm 1.

Algorithm 1: SOMP for multi-task multivariate sparse representation-based classification

Input: Dictionary $D$ as defined in Section I, signal matrix $Y$, number of iterations $K$
Initialization: residual $R_0 = Y$, index set $\Lambda_0 = \emptyset$, iteration counter $k = 1$
while $k \le K$ do
  (1) Find the index of the atom that best approximates all residuals: $\lambda_{i,k} = \arg\max_{j=1,\ldots,n} \sum_{q=1}^{T} w_q \| R_{k-1}^t d_{q,j} \|_p$, $p \ge 1$
  (2) Update the index set: $\Lambda_{i,k} = \Lambda_{i,k-1} \cup \{\lambda_{i,k}\}$, $i = 1, \ldots, T$
  (3) Compute the projection coefficients: $p_{i,k} = ( D_{\Lambda_{i,k}}^t D_{\Lambda_{i,k}} )^{-1} D_{\Lambda_{i,k}}^t y_i$, for $i = 1, \ldots, T$, where $D_{\Lambda_{i,k}} \in \mathbb{R}^{m \times k}$ consists of the $k$ atoms in $D_i$ indexed by $\Lambda_{i,k}$
  (4) Update the residual matrix: $R_k = Y - [ D_{\Lambda_{1,k}} p_{1,k} \ \ldots \ D_{\Lambda_{T,k}} p_{T,k} ]$
  (5) Increment $k$: $k \leftarrow k + 1$
end while
Output: Index set $\Lambda_i = \Lambda_{i,K}$, $i = 1, \ldots, T$; sparse representation $\hat{S}'$ whose non-zero rows, indexed for each representation by $\Lambda_i$, $i = 1, \ldots, T$, are the $K$ rows of the matrix $( D_{\Lambda_{i,K}}^t D_{\Lambda_{i,K}} )^{-1} D_{\Lambda_{i,K}}^t Y$.
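As a complement to the pseudo-code, here is a minimal numpy sketch of Algorithm 1. It is an illustrative sketch, not the authors' implementation: the function name multitask_somp, the uniform default weights, the use of numpy's least-squares solver in place of the explicit pseudo-inverse of step (3), and the masking that keeps the K selected atoms distinct are assumptions made here.

```python
import numpy as np

def multitask_somp(dicts, Y, K, weights=None, p=2):
    """Sketch of Algorithm 1: multi-task multivariate SOMP.

    dicts   : list of T dictionaries D_i, each (m, n); column k of every D_i
              describes the same k-th training sample/event.
    Y       : (m, T) matrix whose i-th column is the i-th representation y_i.
    K       : number of iterations (sparsity level).
    weights : optional length-T confidences w_q; uniform if omitted (assumption).
    Returns the shared index set Lambda and a (K, T) coefficient matrix whose
    i-th column holds the coefficients of y_i over the selected atoms of D_i.
    """
    m, T = Y.shape
    n = dicts[0].shape[1]
    w = np.ones(T) if weights is None else np.asarray(weights, dtype=float)

    R = Y.astype(float).copy()          # residual matrix R_0 = Y
    Lambda = []                         # selected atom indices, shared across tasks
    coeffs = np.zeros((K, T))

    for k in range(K):
        # Step (1): score every atom index j by sum_q w_q * || R^t d_{q,j} ||_p.
        scores = np.zeros(n)
        for q in range(T):
            corr = R.T @ dicts[q]       # (T, n): correlations with all atoms of D_q
            scores += w[q] * np.linalg.norm(corr, ord=p, axis=0)
        scores[Lambda] = -np.inf        # keep the K chosen atoms distinct (assumption)
        Lambda.append(int(np.argmax(scores)))   # Step (2): grow the index set

        # Steps (3)-(4): least-squares fit of each y_i on its selected atoms,
        # then rebuild the residual matrix column by column.
        approx = np.zeros_like(R)
        for i in range(T):
            D_sel = dicts[i][:, Lambda]                        # m x (k+1)
            c, *_ = np.linalg.lstsq(D_sel, Y[:, i], rcond=None)
            coeffs[: k + 1, i] = c
            approx[:, i] = D_sel @ c
        R = Y - approx

    return Lambda, coeffs

# Example usage on random data (hypothetical sizes):
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, T, n, K = 20, 3, 40, 4
    dicts = [rng.standard_normal((m, n)) for _ in range(T)]
    Y = np.column_stack([dicts[i][:, :K] @ rng.standard_normal(K) for i in range(T)])
    Lambda, coeffs = multitask_somp(dicts, Y, K)
    print(sorted(Lambda), coeffs.shape)
```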